Search Results: "ascii"

5 December 2021

Paul Tagliamonte: Framing data (Part 4/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end, we re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a Layer 2 scheme in the OSI model, which is otherwise known as the Data Link Layer. You re using one to view this website right now I m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE. Given that this entire exercise is hard enough without designing a complex Layer 2 scheme, I opted for simplicity in the hopes this would free me from the complexity and research that has gone into this field for the last 50 years. I settled on stealing a few ideas from Ethernet Frames namely, the use of MAC addresses to identify parties, and the EtherType field to indicate the Payload type. I also stole the idea of using a CRC at the end of the Frame to check for corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the polynomial). Lastly, I added a callsign field to make life easier on ham radio frequencies if I was ever to seriously attempt to use a variant of this protocol over the air with multiple users. However, given this scheme is not a commonly used scheme, it s best practice to use a nearby radio to identify your transmissions on the same frequency while testing or use a Faraday box to test without transmitting over the airwaves. I added the callsign field in an effort to lean into the spirit of the Part 97 regulations, even if I relied on a phone emission to identify the Frames. As an aside, I asked the ARRL for input here, and their stance to me over email was I d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users). Right, back to the Frame:
sync
dest
source
callsign
type
payload
crc
With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.
type FrameType [2]byte
type Frame struct  
Destination net.HardwareAddr
Source net.HardwareAddr
Callsign [8]byte
Type FrameType
Payload []byte
CRC uint32
 

Time to pick some consts I picked a unique and distinctive sync sequence, which the sender will transmit before the Frame, while the receiver listens for that sequence to know when it s in byte alignment with the symbol stream. My sync sequence is [3]byte 'U', 'f', '~' which works out to be a very pleasant bit sequence of 01010101 01100110 01111110. It s important to have soothing preambles for your Frames. We need all the good energy we can get at this point.
var (
FrameStart = [3]byte 'U', 'f', '~' 
FrameMaxPayloadSize = 1500
)
Next, I defined some FrameType values for the type field, which I can use to determine what is done with that data next, something Ethernet was originally missing, but has since grown to depend on (who needs Length anyway? Not me. See below!)
FrameType Description Bytes
Raw Bytes in the Payload field are opaque and not to be parsed. [2]byte 0x00, 0x01
IPv4 Bytes in the Payload field are an IPv4 packet. [2]byte 0x00, 0x02
And finally, I decided on a maximum length of the Payload, and decided on limiting it to 1500 bytes to align with the MTU of Ethernet.
var (
FrameTypeRaw = FrameType 0, 1 
FrameTypeIPv4 = FrameType 0, 2 
)
Given we know how we re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.

Why is there no Length field? I was initially a bit surprised that Ethernet Frames didn t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and used bits on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrer? I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.

Finding the Frame start in a Symbol Stream First thing we need to do is find our sync bit pattern in the symbols we re receiving from our BPSK demodulator. There s some smart ways to do this, but given that I m not much of a smart man, I again decided to go for simple instead. Given our incoming vector of symbols (which are still float values) prepend one at a time to a vector of floats that is the same length as the sync phrase, and compare against the sync phrase, to determine if we re in sync with the byte boundary within the symbol stream. The only trick here is that because we re using BPSK to modulate and demodulate the data, post phaselock we can be 180 degrees out of alignment (such that a +1 is demodulated as -1, or vice versa). To deal with that, I check against both the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as well as [-1, 1, -1]) where if the inverse sync is matched, all symbols to follow will be inverted as well. This effectively turns our symbols back into bits, even if we re flipped out of phase. Other techniques like NRZI will represent a 0 or 1 by a change in phase state which is great, but can often cascade into long runs of bit errors, and is generally more complex to implement. That representation isn t ambiguous, given you look for a phase change, not the absolute phase value, which is incredibly compelling. Here s a notional example of how I ve been thinking about the phrase sliding window and how I ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.
 var (
sync = []float  ...  
buf = make([]float, len(sync))
incomingSymbols = []float  ...  
)
for _, el := range incomingSymbols  
copy(buf, buf[1:])
buf[len(buf)-1] = el
if compare(sync, buf)  
// we're synced!
 break
 
 
Given the pseudocode above, let s step through what the checks would be doing at each step:
Buffer Sync Inverse Sync
[ ]float 0, ,0 [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 0, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
[more bits in] [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 1, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we re also 180 degrees out of phase, so we need to flip the symbol s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don t know what this technique is called or even if this is used in real grown-up implementations, but it s been working for my toy implementation.

Next Steps Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!

18 October 2021

Dirk Eddelbuettel: dang 0.0.14: Several Updates

A new release of the dang package arrived at CRAN a couple of hours ago, exactly eight months after the previous release. The dang package regroups a few functions of mine that had no other home as for example lsos() from a StackOverflow question from 2009 (!!), the overbought/oversold price band plotter from an older blog post, the market monitor from the last release as well the checkCRANStatus() function recently tweeted about by Tim Taylor. This release regroups a few small edits to several functions, adds a sample function for character encoding reading and conversion using a library already used by R (hence look Ma, no new depends ), adds a weekday helper, and a sample usage (computing rolling min/max values) of a new simple vector class added to tidyCpp (and the function and class need to get another blog post or study ), and an experimental git sha1sum and date marker (as I am not the fan of autogenerated binaries from repos as opposed to marked released meaning: we may see different binary release with the same version number). The full NEWS entry follows.

Changes in version 0.0.14 (2021-10-17)
  • Updated continuous integration to run on Linux only.
  • Edited checkNonAscii.cpp for readability.
  • More robust title display in intradayMarketMonitor.R.
  • New C++-based function to read and convert encoding via the R-supplied iconv library, noted a potential variability.
  • New function wday returning day of the week as integer.
  • The signature to as.data.table was standardized.
  • A new function rollMinMax was added illustrating use of the NumVec class from tidyCpp.
  • The configure script can record the last commit date and sha1 to automate timestamping builds, but not activated in this release.
  • checkCRANStatus() now works correctly for single-package lookups (Jordan Mark Barbone in #4).

Courtesy of my CRANberries, there is a comparison to the previous release. For questions or comments use the issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who ve done large ports, such as the Mercurial maintainers. It s hard to deny that we ve had to burn a lot of time on this, which I m sure has had an opportunity cost, and from one point of view it s essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it s clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it. Some of this may be because by inclination I m a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it s much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can t think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here). I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it s all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren t trying to do this with the internet having Strong Opinions at us. We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we d just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don t follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn t an issue for us. I m somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization. Reflections on porting to Python 3 I m not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it s already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now. At a bare minimum, a lot of valuable time was lost early in Python 3 s lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for bilingual strategies (code that runs in both Python 2 and 3 for a transitional period) which is where most libraries and most large migrations ended up in practice. For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn t what most people ended up doing, and yet the idea that 2to3 is all you need still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee s porting FAQ as somewhere to start.) There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It s true that it s technically possible to do type annotations in Python 2, but the fact that it s a different syntax that would have to be fixed later is offputting, and in practice it wasn t widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them so that porting code didn t have to be quite so much of an exercise of manually figuring out the exact nature of string types from context. Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there s often overlap between domains. Each of these tends to imply different kinds of string handling. Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python s WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI s encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3. Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3 s development process. For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3 s bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.) 3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python s git history to figure out when and why some particular bit of behaviour had changed. One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can t easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we re currently running, so it wouldn t help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well. A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we d be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn t have to risk affecting the decoding of other pickled strings in the session database. General lessons Writing this over a year after Python 2 s end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it s perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects. I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I m very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison. As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed. Since I knew from the outset that this was going to generate a lot of changes for review, I therefore structured my work from the outset to try to make it as easy as possible for my colleagues to review it. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn t always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes. This is good practice for nearly any kind of development, but it was especially important here since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly. It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way if something went wrong the amount of code we had to debug to figure out what had happened was always tractable. (Although I can t seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I m certain we couldn t have made it work for Launchpad.) I can t speak too highly of Launchpad s test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were. As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.) Python lessons Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won t be trying to deal with quite so many bytes/text issues at the same time as everything else. Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren t yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I m not sure about unicode_literals, though. For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language s native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward and maybe I should have listened more to people telling me it might be a bad idea. A lot of Launchpad s early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...' versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don t apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest s globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you re using probably don t work properly. Basically, don t have done that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience. Regressions I know of nine regressions that reached Launchpad s production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.) Equality testing of removed database objects One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object s primary key, and that may not yet be available if we ve created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it s possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database. In most cases we don t keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem. Debian imports crashed on non-UTF-8 filenames Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this. The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We d actually run into something similar before: it s a subtle porting gotcha, since it s quite easy to end up passing Unicode strings to shutil.rmtree if you re in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8. lazr.restful ETags We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you re trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn t match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again. We initially thought that this might be legitimate since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we d recently switched a quarter of our appservers to Python 3, that was a natural suspect. Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings such as lists of bug tags differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers. Memory leaks This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we d switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn t so much a small hole that we can bail occasionally as the boat is sinking rapidly : A serious memory leak (Yes, this got in the way of working out what was going on with ETags for a while.) I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn t urgent but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate). Those didn t help the leaks much though, and after a while it became clear to me that this couldn t be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on. Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that s good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable: A rather flat memory graph This isn t completely satisfying as we never quite got to the bottom of the leak itself, and it s entirely possible that we ve only papered over it using max_requests: I expect we ll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it s no longer an operational concern. Mirror prober HTTPS proxy handling After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they re reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn t able to sort out that gap, but at least the fix was simple. Non-MIME-encoded email headers As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it s an area where it s very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn t want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python s email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn t be surprised to find the odd other incorrect detail in this sort of area. Failure to handle non-ISO-8859-1 URL-encoded form input Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn t post Unicode comments to bugs, which turned out to be only if the attempt was made using JavaScript, and was because I hadn t quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as in many ways an aberrant monstrosity , so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2 s urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they re passed in as part of a Unicode string, so multipart needs to work around this on Python 2. I m still not completely confident that this is correct in all situations, but at least now that we re on Python 3 everywhere the matrix of cases we need to care about is smaller. Inconsistent marshalling of Loggerhead s disk cache We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn t think to look for because it s a relatively obscure module mostly used for internal purposes by Python itself; I m not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we d just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here. Intermittent bzr failures Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent bzr branch hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
I d seen this before in our git hosting service: it was a bug in Twisted s Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

4 June 2021

Matthew Garrett: Mike Lindell's Cyber "Evidence"

Mike Lindell, notable for absolutely nothing relevant in this field, today filed a lawsuit against a couple of voting machine manufacturers in response to them suing him for defamation after he claimed that they were covering up hacks that had altered the course of the US election. Paragraph 104 of his suit asserts that he has evidence of at least 20 documented hacks, including the number of votes that were changed. The citation is just a link to a video called Absolute 9-0, which claims to present sufficient evidence that the US supreme court will come to a 9-0 decision that the election was tampered with.

The claim is that Lindell was provided with a set of files on the 9th of January, and gave these to some cyber experts to verify. These experts identified them as packet captures. The video contains scrolling hex, and we are told that this is the raw encrypted data from the files. In reality, the hex values correspond very clearly to printable ASCII, and appear to just be the Pennsylvania voter roll. They're not encrypted, and they're not packet captures (they contain no packet headers).

20 of these packet captures were then selected and analysed, giving us the tables contained within Exhibit 12. The alleged source IPs appear to correspond to the networks the tables claim, and the latitude and longitude presumably just come from a geoip lookup of some sort (although clearly those values are far too precise to be accurate). But if we look at the target IPs, we find something interesting. Most of them resolve to the website for the county that was the nominal target (eg, 198.108.253.104 is www.deltacountymi.org). So, we're supposed to believe that in many cases, the county voting infrastructure was hosted on the county website.

Unfortunately we're not given the destination port, but 198.108.253.104 isn't listening on anything other than 80 and 443. We're told that the packet data is encrypted, so presumably it's over HTTPS. So, uh, how did they decrypt this to figure out how many votes were switched? If Mike's hackers have broken TLS, they really don't need to be dealing with this.

We're also given some background information on how it's impossible to reconstruct packet captures after the fact (untrue), or that modifying them would change their hashes (true, but in the absence of known good hash values that tells us nothing), but it's pretty clear that nothing we're shown actually demonstrates what we're told it does.

In summary: yes, any supreme court decision on this would be 9-0, just not the way he's hoping for.

Update: It was pointed out that this data appears to be part of a larger dataset. This one is even more dubious - it somehow has MAC addresses for both the source and destination (which is impossible), and almost none of these addresses are in actual issued ranges.

comment count unavailable comments

6 May 2021

Matthew Garrett: More doorbell adventures

Back in my last post on this topic, I'd got shell on my doorbell but hadn't figured out why the HTTP callbacks weren't always firing. I still haven't, but I have learned some more things.

Doorbird sell a chime, a network connected device that is signalled by the doorbell when someone pushes a button. It costs about $150, which seems excessive, but would solve my problem (ie, that if someone pushes the doorbell and I'm not paying attention to my phone, I miss it entirely). But given a shell on the doorbell, how hard could it be to figure out how to mimic the behaviour of one?

Configuration for the doorbell is all stored under /mnt/flash, and there's a bunch of files prefixed 1000eyes that contain config (1000eyes is the German company that seems to be behind Doorbird). One of these was called 1000eyes.peripherals, which seemed like a good starting point. The initial contents were "Peripherals":[] , so it seemed likely that it was intended to be JSON. Unfortunately, since I had no access to any of the peripherals, I had no idea what the format was. I threw the main application into Ghidra and found a function that had debug statements referencing "initPeripherals and read a bunch of JSON keys out of the file, so I could simply look at the keys it referenced and write out a file based on that. I did so, and it didn't work - the app stubbornly refused to believe that there were any defined peripherals. The check that was failing was pcVar4 = strstr(local_50[0],PTR_s_"type":"_0007c980);, which made no sense, since I very definitely had a type key in there. And then I read it more closely. strstr() wasn't being asked to look for "type":, it was being asked to look for "type":". I'd left a space between the : and the opening " in the value, which meant it wasn't matching. The rest of the function seems to call an actual JSON parser, so I have no idea why it doesn't just use that for this part as well, but deleting the space and restarting the service meant it now believed I had a peripheral attached.

The mobile app that's used for configuring the doorbell now showed a device in the peripherals tab, but it had a weird corrupted name. Tapping it resulted in an error telling me that the device was unavailable, and on the doorbell itself generated a log message showing it was trying to reach a device with the hostname bha-04f0212c5cca and (unsurprisingly) failing. The hostname was being generated from the MAC address field in the peripherals file and was presumably supposed to be resolved using mDNS, but for now I just threw a static entry in /etc/hosts pointing at my Home Assistant device. That was enough to show that when I opened the app the doorbell was trying to call a CGI script called peripherals.cgi on my fake chime. When that failed, it called out to the cloud API to ask it to ask the chime[1] instead. Since the cloud was completely unaware of my fake device, this didn't work either. I hacked together a simple server using Python's HTTPServer and was able to return data (another block of JSON). This got me to the point where the app would now let me get to the chime config, but would then immediately exit. adb logcat showed a traceback in the app caused by a failed assertion due to a missing key in the JSON, so I ran the app through jadx, found the assertion and from there figured out what keys I needed. Once that was done, the app opened the config page just fine.

Unfortunately, though, I couldn't edit the config. Whenever I hit "save" the app would tell me that the peripheral wasn't responding. This was strange, since the doorbell wasn't even trying to hit my fake chime. It turned out that the app was making a CGI call to the doorbell, and the thread handling that call was segfaulting just after reading the peripheral config file. This suggested that the format of my JSON was probably wrong and that the doorbell was not handling that gracefully, but trying to figure out what the format should actually be didn't seem easy and none of my attempts improved things.

So, new approach. Rather than writing the config myself, why not let the doorbell do it? I should be able to use the genuine pairing process if I could mimic the chime sufficiently well. Hitting the "add" button in the app asked me for the username and password for the chime, so I typed in something random in the expected format (six characters followed by four zeroes) and a sufficiently long password and hit ok. A few seconds later it told me it couldn't find the device, which wasn't unexpected. What was a little more unexpected was that the log on the doorbell was showing it trying to hit another bha-prefixed hostname (and, obviously, failing). The hostname contains the MAC address, but I hadn't told the doorbell the MAC address of the chime, just its username. Some more digging showed that the doorbell was calling out to the cloud API, giving it the 6 character prefix from the username and getting a MAC address back. Doing the same myself revealed that there was a straightforward mapping from the prefix to the mac address - changing the final character from "a" to "b" incremented the MAC by one. It's actually just a base 26 encoding of the MAC, with aaaaaa translating to 00408C000000.

That explained how the hostname was being generated, and in return I was able to work backwards to figure out which username I should use to generate the hostname I was already using. Attempting to add it now resulted in the doorbell making another CGI call to my fake chime in order to query its feature set, and by mocking that up as well I was able to send back a file containing X-Intercom-Type, X-Intercom-TypeId and X-Intercom-Class fields that made the doorbell happy. I now had a valid JSON file, which cleared up a couple of mysteries. The corrupt name was because the name field isn't supposed to be ASCII - it's base64 encoded UTF16-BE. And the reason I hadn't been able to figure out the JSON format correctly was because it looked something like this:

"Peripherals":[] "prefix": "type":"DoorChime","name":"AEQAbwBvAHIAYwBoAGkAbQBlACAAVABlAHMAdA==","mac":"04f0212c5cca","user":"username","password":"password" ]


Note that there's a total of one [ in this file, but two ]s? Awesome. Anyway, I could now modify the config in the app and hit save, and the doorbell would then call out to my fake chime to push config to it. Weirdly, the association between the chime and a specific button on the doorbell is only stored on the chime, not on the doorbell. Further, hitting the doorbell didn't result in any more HTTP traffic to my fake chime. However, it did result in some broadcast UDP traffic being generated. Searching for the port number led me to the Doorbird LAN API and a complete description of the format and encryption mechanism in use. Argon2I is used to turn the first five characters of the chime's password (which is also stored on the doorbell itself) into a 256-bit key, and this is used with ChaCha20 to decrypt the payload. The payload then contains a six character field describing the device sending the event, and then another field describing the event itself. Some more scrappy Python and I could pick up these packets and decrypt them, which showed that they were being sent whenever any event occurred on the doorbell. This explained why there was no storage of the button/chime association on the doorbell itself - the doorbell sends packets for all events, and the chime is responsible for deciding whether to act on them or not.

On closer examination, it turns out that these packets aren't just sent if there's a configured chime. One is sent for each configured user, avoiding the need for a cloud round trip if your phone is on the same network as the doorbell at the time. There was literally no need for me to mimic the chime at all, suitable events were already being sent.

Still. There's a fair amount of WTFery here, ranging from the strstr() based JSON parsing, the invalid JSON, the symmetric encryption that uses device passwords as the key (requiring the doorbell to be aware of the chime's password) and the use of only the first five characters of the password as input to the KDF. It doesn't give me a great deal of confidence in the rest of the device's security, so I'm going to keep playing.

[1] This seems to be to handle the case where the chime isn't on the same network as the doorbell

comment count unavailable comments

14 April 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.4.0.0 on CRAN: New Upstream Plus

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to a Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 852 other packages on CRAN. This new release brings us the just release Armadillo 10.4.0. Upstream moves at a speed that is a little faster than the cadence CRAN likes. We release RcppArmadillo 0.10.2.2.0 on March 9; and upstream 10.3.0 came out shortly thereafter. We aim to accomodate CRAN with (roughly) monthly (or less frequent) releases) so by the time we were ready 10.4.0 had just come out. As it turns, the full testing had a benefit. Among the (currently) 852 CRAN packages using RcppArmadillo, two were failing tests. This is due to a subtle, but important point. Early on we realized that it would be beneficial if the standard R control over random-number creation and seeding affected Armadillo too, which Conrad accomodated kindly with an optional RNG interface which RcppArmadillo supplies. With recent changes he made, the R side saw normally-distributed draws (via the Armadillo interface) changed, which lead to the two changes. All hail unit tests. So I mentioned this to Conrad, and with the usual Chicago-Brisbane time difference late my evening a fix was in my inbox. The CRAN upload was then halted as I had missed that due to other changes he had made random draws from a Gamma would now call std::rand() which CRAN flags. Another email to Brisbane, another late (one-line) fix back and all was good. We still encountered one package with an error but flagged this as internal to that package s setup, so Uwe let RcppArmadillo onto CRAN, I contacted that package s maintainer who was very receptive and a change should be forthcoming. So with all that we have 0.10.4.0.0 on CRAN giving us Armadillo 10.4.0. The full set of changes follows. As Armadillo 10.3.0 was not uploaded to CRAN, its changes are included too.

Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
  • Upgraded to Armadillo release 10.4.0 (Pressure Cooker)
    • faster handling of triangular matrices by log_det()
    • added log_det_sympd() for log determinant of symmetric positive matrices
    • added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages
    • reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings
  • Apply one upstream corrections for arma::randn draws when using alternative (here R) generator, and arma::randg.

Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
  • Upgraded to Armadillo release 10.3 (Sunrise Chaos)
    • faster handling of symmetric positive definite matrices by pinv()
    • expanded .save() / .load() for dense matrices to handle coord_ascii format
    • for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error
    • improved quality of random numbers

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

9 April 2021

Michael Prokop: A Ceph war story

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph Cluster; this is the story of how we recovered them. At the end of 2020, we eventually had a long outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned out to be an emergency case; let s call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it s worth sharing those with others. But first things first, let s step back and clarify what we had to deal with. The system and its upgrade One part of the upgrade included 3 Debian servers (we re calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster. First, we went for upgrading the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release, supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjust ring0 + ring1 address settings and add a mon_host configuration to the Ceph configuration. During the first two servers reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn t working as expected after the upgrade. The underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock skew problems with Ceph, indicating that the Ceph monitors clocks aren t running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failure derived from this clock skew problem, so we took care of it. After yet another round of reboots, to ensure the systems are running all with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:
% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0    down  1.00000 1.00000
 1   hdd  1.08989         osd.1    down  1.00000 1.00000
 2   hdd  1.63539         osd.2    down  1.00000 1.00000
 3   hdd  1.63539         osd.3    down  1.00000 1.00000
 4   hdd  1.63539         osd.4    down  1.00000 1.00000
 5   hdd  1.63539         osd.5    down  1.00000 1.00000
18   hdd  2.18279         osd.18   down  1.00000 1.00000
20   hdd  2.18179         osd.20   down  1.00000 1.00000
28   hdd  2.18179         osd.28   down  1.00000 1.00000
29   hdd  2.18179         osd.29   down  1.00000 1.00000
30   hdd  2.18179         osd.30   down  1.00000 1.00000
31   hdd  2.18179         osd.31   down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6    down  1.00000 1.00000
 7   hdd  1.08989         osd.7    down  1.00000 1.00000
 8   hdd  1.63539         osd.8    down  1.00000 1.00000
 9   hdd  1.63539         osd.9    down  1.00000 1.00000
10   hdd  1.63539         osd.10   down  1.00000 1.00000
11   hdd  1.63539         osd.11   down  1.00000 1.00000
19   hdd  2.18179         osd.19     up  1.00000 1.00000
21   hdd  2.18279         osd.21     up  1.00000 1.00000
22   hdd  2.18279         osd.22     up  1.00000 1.00000
32   hdd  2.18179         osd.32   down  1.00000 1.00000
33   hdd  2.18179         osd.33   down  1.00000 1.00000
34   hdd  2.18179         osd.34   down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12   down  1.00000 1.00000
13   hdd  1.08989         osd.13   down  1.00000 1.00000
14   hdd  1.63539         osd.14   down  1.00000 1.00000
15   hdd  1.63539         osd.15   down  1.00000 1.00000
16   hdd  1.63539         osd.16   down  1.00000 1.00000
17   hdd  1.63539         osd.17   down  1.00000 1.00000
23   hdd  2.18190         osd.23   down  1.00000 1.00000
24   hdd  2.18279         osd.24   down  1.00000 1.00000
25   hdd  2.18279         osd.25   down  1.00000 1.00000
35   hdd  2.18179         osd.35   down  1.00000 1.00000
36   hdd  2.18179         osd.36   down  1.00000 1.00000
37   hdd  2.18179         osd.37   down  1.00000 1.00000
Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back? We stumbled upon this beauty in our logs:
kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5
ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.
The same issue was present for the other failing OSDs. We hoped, that the data itself was still there, and only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2 with the OSDs on filestore (nowadays being a legacy approach to storing objects in Ceph). However, we migrated the disks to bluestore since then (with ceph-disk and not yet via ceph-volume what s being used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD. Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share how working XFS partitions looked like for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:
synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x1d3c400)
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000
releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number
bad magic number
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Ouch. This very much looked related to the actual issue we re seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using -d sunit=512 -d swidth=512 at least worked then, so we decided to force its usage in the creation of our OSD XFS partition. This brought us a working XFS partition. Please note, sunit must not be larger than swidth (more on that later!). Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from ceph --format json osd dump , like this for all our OSDs (Zsh syntax ftw!):
synpromika@server1 ~ % for f in  0..37  ; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump   jq -r ".osds[]   select(.osd==$f)   .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]
Identifying the corresponding raw device for each OSD UUID is possible via:
synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"$ UUID "
/dev/sdc1
The OSD s key ID can be retrieved via:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."$ OSD_ID " -f json 2>/dev/null   jq -r '.[]   .key'
AQCKFpZdm0We[...]
Now we also need to identify the underlying block device:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."$ OSD_ID " -f json   jq -r '.bluestore_bdev_partition_path'    
/dev/sdc2
With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure. We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server. Thanks to this, we managed to get the Ceph cluster up and running again. We didn t want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA! Root Cause Analysis So all but three OSDs on server2 failed, and the problem seems to be related to XFS. Therefore, our starting point for the RCA was, to identify what was different on server2, as compared to server1 + server3. My initial assumption was that this was related to some firmware issues with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:
synpromika@server1 ~ % sudo storcli64 /c0 show all   grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
synpromika@server2 ~ % sudo storcli64 /c0 show all   grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489
synpromika@server3 ~ % sudo storcli64 /c0 show all   grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 got replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make? Now let s check firmware changelogs. For the 24.21.0-0097 release we found this:
- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20(SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0(SCGCQ02056038)
Our XFS problem certainly was related to the controller s firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:
WARN  server2.example.org  Mount options of /var/lib/ceph/osd/ceph-21      WARN - Missing: sunit=1024, Exceeding: sunit=512
We compared the new OSD 21 with an existing one (OSD 25 on server3):
synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount   grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota
synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount   grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota
Thanks to our documentation, we could compare execution logs of their creation:
% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1              isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1              isize=2048   agcount=4, agsize=6336 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =                       bsize=4096   blocks=25088, imaxpct=25
-         =                       sunit=128    swidth=64 blks
+data     =                       bsize=4096   blocks=25344, imaxpct=25
+         =                       sunit=64     swidth=64 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=1608, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
[...]
So back then, we even tried to track this down but couldn t make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam s razor, assuming the simplest explanation is usually the right one, so let s check the disk properties and see what differs:
synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144
synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144
See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:
synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk            
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288
synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk 
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144
It doesn t make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This leads us to Bug 202127 cannot mount or create xfs on a 597T device, which matches our findings here. But why did this XFS partition work in the past and fails now with the newer kernel version? The XFS behaviour change Now given that we have backups of all the XFS partition, we wanted to track down, a) when this XFS behaviour was introduced, and b) whether, and if so how it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you would have no working Ceph OSD or backups left). Let s look at such a failing XFS partition with the Grml live system:
root@grml ~ # grml-version
grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24]
root@grml ~ # uname -a
Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux
root@grml ~ # grml-hostname grml-2020-06
Setting hostname to grml-2020-06: done
root@grml ~ # exec zsh
root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux
Desired=Unknown/Install/Remove/Purge/Hold
  Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
 / Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
 / Name           Version      Architecture Description
+++-==============-============-============-=========================================
ii  util-linux     2.35.2-4     amd64        miscellaneous system utilities
ii  xfsprogs       5.6.0-1+b2   amd64        Utilities for managing the XFS filesystem
There it s failing, no matter which mount option we try:
root@grml-2020-06 ~ # mount ./sdd1.dd /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
root@grml-2020-06 ~ # dmesg   tail -30
[...]
[   64.788640] XFS (loop1): SB stripe unit sanity check failed
[   64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   64.788671] XFS (loop1): Unmount and run xfs_repair
[   64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   64.788679] XFS (loop1): SB validate failed with error -117.
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
32 root@grml-2020-06 ~ # dmesg   tail -1
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
32 root@grml-2020-06 ~ # dmesg   tail -14
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
[   80.751277] XFS (loop1): SB stripe unit sanity check failed
[   80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff 
[   80.751324] XFS (loop1): Unmount and run xfs_repair
[   80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   80.751338] XFS (loop1): SB validate failed with error -117.
Also xfs_repair doesn t help either:
root@grml-2020-06 ~ # xfs_info ./sdd1.dd
meta-data=./sdd1.dd              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
With the SB stripe unit sanity check failed message, we could easily track this down to the following commit fa4ca9c:
% git show fa4ca9c5574605d1e48b7e617705230a0640b6da   cat
commit fa4ca9c5574605d1e48b7e617705230a0640b6da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 5 10:06:16 2018 -0700
    
    xfs: catch bad stripe alignment configurations
    
    When stripe alignments are invalid, data alignment algorithms in the
    allocator may not work correctly. Ensure we catch superblocks with
    invalid stripe alignment setups at mount time. These data alignment
    mismatches are now detected at mount time like this:
    
    XFS (loop0): SB stripe unit sanity check failed
    XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
    XFS (loop0): Unmount and run xfs_repair
    XFS (loop0): First 128 bytes of corrupted metadata buffer:
    0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
    0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
    000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
    0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
    00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
    000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ... .4..........
    00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
    XFS (loop0): SB validate failed with error -117.
    
    And the mount fails.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c
index b5dca3c8c84d..c06b6fc92966 100644
--- fs/xfs/libxfs/xfs_sb.c
+++ fs/xfs/libxfs/xfs_sb.c
@@ -278,6 +278,22 @@ xfs_mount_validate_sb(
                return -EFSCORRUPTED;
         
        
+       if (sbp->sb_unit)  
+               if (!xfs_sb_version_hasdalign(sbp)  
+                   sbp->sb_unit > sbp->sb_width  
+                   (sbp->sb_width % sbp->sb_unit) != 0)  
+                       xfs_notice(mp, "SB stripe unit sanity check failed");
+                       return -EFSCORRUPTED;
+                 
+         else if (xfs_sb_version_hasdalign(sbp))   
+               xfs_notice(mp, "SB stripe alignment sanity check failed");
+               return -EFSCORRUPTED;
+         else if (sbp->sb_width)  
+               xfs_notice(mp, "SB stripe width sanity check failed");
+               return -EFSCORRUPTED;
+        
+
+       
        if (xfs_sb_version_hascrc(&mp->m_sb) &&
            sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE)  
                xfs_notice(mp, "v5 SB sanity check failed");
This change is included in kernel versions 4.18-rc1 and newer:
% git describe --contains fa4ca9c5574605d1e48
v4.18-rc1~37^2~14
Now let s try with an older kernel version (4.9.0), using old Grml 2017.05 release:
root@grml ~ # grml-version
grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31]
root@grml ~ # uname -a
Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux
root@grml ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.0 (stretch)
Release:        9.0
Codename:       stretch
root@grml ~ # grml-hostname grml-2017-05
Setting hostname to grml-2017-05: done
root@grml ~ # exec zsh
root@grml-2017-05 ~ #
root@grml-2017-05 ~ # xfs_info ./sdd1.dd
xfs_info: ./sdd1.dd is not a mounted XFS filesystem
1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt
root@grml-2017-05 ~ # mount -t xfs
/root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota)
root@grml-2017-05 ~ # ls /mnt
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  systemd  type  whoami
root@grml-2017-05 ~ # xfs_info /mnt
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:
root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/
And indeed, mounting this rewritten filesystem then also works with newer kernels:
root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/
root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=64    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # mount -t xfs                
/root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)
FTR: The sunit=512,swidth=512 from the xfs mount option is identical to xfs_info s output sunit=64,swidth=64 (because mount.xfs s sunit value is given in 512-byte block units, see man 5 xfs, and the xfs_info output reported here is in blocks with a block size (bsize) of 4096, so sunit = 512*512 := 64*4096 ). mkfs uses minimum and optimal sizes for stripe unit and stripe width; you can check this e.g. via (note that server2 with fixed firmware version reports proper values, whereas server3 with broken controller firmware reports non-sense):
synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]
synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]
This is the underlying reason why the initially created XFS partitions were created with incorrect sunit/swidth settings. The broken firmware of server1 and server3 was the cause of the incorrect settings they were ignored by old(er) xfs/kernel versions, but treated as an error by new ones. Make sure to also read the XFS FAQ regarding How to calculate the correct sunit,swidth values for optimal performance . We also stumbled upon two interesting reads in RedHat s knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947. Am I affected? How to work around it? To check whether your XFS mount points are affected by this issue, the following command line should be useful:
awk '$3 == "xfs" print $2 ' /proc/self/mounts   while read mount ; do echo -n "$mount " ; xfs_info $mount   awk '$0 ~ "swidth" gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3 '   awk '  if ($1 > $2) print "impacted"; else print "OK" ' ; done
If you run into the above situation, the only known solution to get your original XFS partition working again, is to boot into an older kernel version again (4.17 or older), mount the XFS partition with correct sunit/swidth settings and then boot back into your new system (kernel version wise). Lessons learned Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho. Looking for help with your IT infrastructure? Let us know!

24 February 2021

Rog rio Brito: Alternatives to ikiwiki?

It's been quite a long time since I last posted anything on this blog and I can say that one of the reasons for that I don't feel comfortable using ikiwiki anymore. I am actively looking for alternatives to ikiwiki that allow me, mainly, to write blog posts with the following characteristics (not necessarily in order of importance): Connected to the fact that I only can have static sites (no CGI, no forms, nothing else), I am, at this time, using Disqus to host the comments of my blog. I am also thinking of alternatives to this, like sending people to Twitter (or mastodon or email) or some site similar to Disqus, but with more of a Free Software inclination. Anyway, I am almost ready for any kind of transition, since I have already converted most posts (of course, not yet this one ) with some Python scripts to a format that I feel is a bit more format-agnostic than what ikiwiki uses.

  1. That's not to mention the myriad of hugo themes and theme authors that try to bribe you into using their hosted solutions (despite branding everything as "open source! OMG!"), like "wowchemy" you will have a hard time untangling the instructions of their so bloated themes to be usable on your local computer; so much so to the point that you give up with their convoluted configuration (which, potentially, doesn't "transfer" to other themes, if you are worried about possibly changing themes in the future).
  2. I like the style of fenced blocks that GitHub used, where you prefix the code with the name of the language to give a hint of how to highlight the code snipped.

16 February 2021

Michael Prokop: How to properly use 3rd party Debian repository signing keys with apt

(Blogging this, since this is a recurring anti-pattern I noticed at several customers and often comes up during deployments of 3rd party repositories.) Update on 2021-02-19: clarified, that Signed-By requires apt >= 1.1, thanks Vincent Bernat Many upstream projects provide Debian repository instructions like this:
curl -fsSL https://example.com/stable/debian.gpg   sudo apt-key add -
Do not follow this, for different reasons, including:
  1. You do not see what you get before adding the GPG key to your global apt trust store
  2. You can t easily script this via your preferred configuration management (the apt-key manpage clearly discourages programmatic usage)
  3. The signing key is considered valid for all your enabled Debian repositories (instead of only a specific one)
  4. You need GnuPG (either gnupg2 or gnupg1) on your system for usage with apt-key
There s a much better approach to this: download the GPG key, make sure it s in the appropriate format, then use it via deb [signed-by=/usr/share/keyrings/ ] in your apt s sources list configuration. Note and FTR: the Signed-By feature is available starting with apt 1.1 (so apt in Debian jessie/8 and older does not support it). TL;DR: As an example, let s demonstrate this with the Tailscale Debian repository for buster.
Downloading the GPG file will give you an ascii-armored GPG file:
% curl -fsSL -o buster.gpg https://pkgs.tailscale.com/stable/debian/buster.gpg
% gpg --keyid-format long buster.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/458CA832957F5868 2020-02-25 [SC]
      2596A99EAAB33821893C0A79458CA832957F5868
uid                           Tailscale Inc. (Package repository signing key) <info@tailscale.com>
sub   rsa4096/B1547A3DDAAF03C6 2020-02-25 [E]
% file buster.gpg
buster.gpg: PGP public key block Public-Key (old)
If you have apt version >= 1.4 available (Debian >=stretch/9 and Ubuntu >=bionic/18.04), you can use this file directly as follows:
% sudo mv buster.gpg /usr/share/keyrings/tailscale.asc
% cat /etc/apt/sources.list.d/tailscale.list
deb [signed-by=/usr/share/keyrings/tailscale.asc] https://pkgs.tailscale.com/stable/debian buster main
% sudo apt update
[...]
And you re done! Iff your apt version really is older than 1.4, you need to convert the ascii-armored GPG file into a GPG key public ring file (AKA binary OpenPGP format), either by just dearmor-ing it (if you don t care about checking ID + fingerprint):
% gpg --dearmor < buster.gpg > tailscale.gpg
or if you prefer to go via GPG, you can also use a temporary GPG home directory (if you don t care about going through your personal GPG setup):
% mkdir --mode=700 /tmp/gpg-tmpdir
% gpg --homedir /tmp/gpg-tmpdir --import ./buster.gpg
gpg: keybox '/tmp/gpg-tmpdir/pubring.kbx' created
gpg: /tmp/gpg-tmpdir/trustdb.gpg: trustdb created
gpg: key 458CA832957F5868: public key "Tailscale Inc. (Package repository signing key) <info@tailscale.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
% gpg --homedir /tmp/gpg-tmpdir --output tailscale.gpg  --export-options=export-minimal --export 0x458CA832957F5868
% rm -rf /tmp/gpg-tmpdir
The resulting GPG key public ring file should look like that:
% file tailscale.gpg 
tailscale.gpg: PGP/GPG key public ring (v4) created Tue Feb 25 04:51:20 2020 RSA (Encrypt or Sign) 4096 bits MPI=0xc00399b10bc12858...
% gpg tailscale.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/458CA832957F5868 2020-02-25 [SC]
      2596A99EAAB33821893C0A79458CA832957F5868
uid                           Tailscale Inc. (Package repository signing key) <info@tailscale.com>
sub   rsa4096/B1547A3DDAAF03C6 2020-02-25 [E]
Then you can use this GPG file on your system as follows:
% sudo mv tailscale.gpg /usr/share/keyrings/tailscale.gpg
% cat /etc/apt/sources.list.d/tailscale.list
deb [signed-by=/usr/share/keyrings/tailscale.gpg] https://pkgs.tailscale.com/stable/debian buster main
% sudo apt update
[...]
Such a setup ensures:
  1. You can verify the GPG key file (ID + fingerprint)
  2. You can easily ship files via /usr/share/keyrings/ and refer to it in your deployment scripts, configuration management, (and can also easily update or get rid of them again!)
  3. The GPG key is valid only for the repositories with the corresponding [signed-by=/usr/share/keyrings/ ] entry
  4. You don t need to install GnuPG (neither gnupg2 nor gnupg1) on the system which is using the 3rd party Debian repository
Thanks: Guillem Jover for reviewing an early draft of this blog article.

31 January 2021

Paul Wise: FLOSS Activities January 2021

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian ports: fix header on incoming.ports.d.o/buildd
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Initiate discussions about aptly, Linux USB gadgets maintenance
  • Respond to queries from Debian users and contributors on the mailing lists and IRC
  • Invite organisations to post on FOSSjobs

Sponsors The samba work, apt-listchanges work, pytest-rerunfailures upload, python-testfixtures/python-scrapy bugs and python-scrapy related backports were sponsored by my employer. All other work was done on a volunteer basis.

30 January 2021

Emmanuel Kasper: Playing Tetris over serial console

Today I played Tetris over a serial console connection, on a Vax 4000 running OpenBSD. I haven't felt that 1337 since a long time.
I am going to get rid of that Vax system though. If that's your stuff, contact me privately.

asciinema in its greatness:

17 January 2021

Enrico Zini: OpenStreetMap maps links

Some interesting renderings of OpenStreetMap data: If you want to print out local maps, MyOSMatic is a service to generate maps of cities using OpenStreetMap data. The generated maps are available in PNG, PDF and SVG formats and are ready to be printed. On a terminal? Sure: MapSCII is a Braille & ASCII world map renderer for the console: telnet mapscii.me and explore the map with the mouse, keyboard arrows, and a and z to zoom, c to switch to block character mode, q to quit. Alternatively you can fly over it, and you might have to dodge the rare map editing bug, or have fun landing on it: A typo created a 212-story monolith in Microsoft Flight Simulator

1 January 2021

Dmitry Shachnev: A review of endianness bugs in Qt, and how they were fixed

As you may know, I am Qt 5 maintainer in Debian. Maintaning Qt means not only bumping the version each time a new version is released, but also making sure Qt builds successfully on all architectures that are supported in Debian (and for some submodules, the automatic tests pass). An important sort of build failures are endianness specific failures. Most widely used architectures (x86_64, aarch64) are little endian. However, Debian officially supports one big endian architecture (s390x), and unofficially a few more ports are provided, such as ppc64 and sparc64. Unfortunately, Qt upstream does not have any big endian machine in their CI system, so endianness issues get noticed only when the packages fail to build on our build daemons. In the last years I have discovered and fixed some such issues in various parts of Qt, so I decided to write a post to illustrate how to write really cross-platform C/C++ code. Issue 1: the WebP image format handler (code review) The relevant code snippet is:
if (srcImage.format() != QImage::Format_ARGB32)
    srcImage = srcImage.convertToFormat(QImage::Format_ARGB32);
// ...
if (!WebPPictureImportBGRA(&picture, srcImage.bits(), srcImage.bytesPerLine()))  
    // ...
 
The code here is serializing the images into QImage::Format_ARGB32 format, and then passing the bytes into WebP s import function. With this format, the image is stored using a 32-bit ARGB format (0xAARRGGBB). This means that the bytes will be 0xBB, 0xGG, 0xRR, 0xAA or little endian and 0xAA, 0xRR, 0xGG, 0xBB on big endian. However, WebPPictureImportBGRA expects the first format on all architectures. The fix was to use QImage::Format_RGBA8888. As the QImage documentation says, with this format the order of the colors is the same on any architecture if read as bytes 0xRR, 0xGG, 0xBB, 0xAA. Issue 2: qimage_converter_map structure (code review) The code seems to already support big endian. But maybe you can spot the error?
#if Q_BYTE_ORDER == Q_LITTLE_ENDIAN
        0,
        convert_ARGB_to_ARGB_PM,
#else
        0,
        0
#endif
It is the missing comma! It is present in the little endian block, but not in the big endian one. This was fixed trivially. Issue 3: QHandle, part of Qt 3D module (code review) QHandle class uses a union that is declared as follows:
struct Data  
    quint32 m_index : IndexBits;
    quint32 m_counter : CounterBits;
    quint32 m_unused : 2;
 ;
union  
    Data d;
    quint32 m_handle;
 ;
The sizes are declared such as IndexBits + CounterBits + 2 is always equal to 32 (four bytes). Then we have a constructor that sets the values of Data struct:
QHandle(quint32 i, quint32 count)
 
    d.m_index = i;
    d.m_counter = count;
    d.m_unused = 0;
 
The value of m_handle will be different depending on endianness! So the test that was expecting a particular value with given constructor arguments was failing. I fixed it by using the following macro:
#if Q_BYTE_ORDER == Q_BIG_ENDIAN
#define GET_EXPECTED_HANDLE(qHandle) ((qHandle.index() << (qHandle.CounterBits + 2)) + (qHandle.counter() << 2))
#else /* Q_LITTLE_ENDIAN */
#define GET_EXPECTED_HANDLE(qHandle) (qHandle.index() + (qHandle.counter() << qHandle.IndexBits))
#endif
Issue 4: QML compiler (code review) The QML compiler used a helper class named LEUInt32 (based on QLEInteger) that always stored the numbers in little endian internally. This class can be safely mixed with native quint32 on little endian systems, but not on big endian. Usually the compiler would warn about type mismatch, but here the code used reinterpret_cast, such as:
quint32 *objectTable = reinterpret_cast<quint32*>(data + qmlUnit->offsetToObjects);
So this was not noticed on build time, but the compiler was crashing. The fix was trivial again, replacing quint32 with QLEUInt32. Issue 5: QModbusPdu, part of Qt Serial Bus module (code review) The code snippet is simple:
QModbusPdu::FunctionCode code = QModbusPdu::Invalid;
if (stream.readRawData((char *) (&code), sizeof(quint8)) != sizeof(quint8))
    return stream;
QModbusPdu::FunctionCode is an enum, so code is a multi-byte value (even if only one byte is significant). However, (char *) (&code) returns a pointer to the first byte of it. It is the needed byte on little endian systems, but it is the wrong byte on big endian ones! The correct fix was using a temporary one-byte variable:
quint8 codeByte = 0;
if (stream.readRawData((char *) (&codeByte), sizeof(quint8)) != sizeof(quint8))
    return stream;
QModbusPdu::FunctionCode code = (QModbusPdu::FunctionCode) codeByte;
Issue 6: qt_is_ascii (code review) This function, as the name says, checks whether a string is ASCII. It does that by splitting the string into 4-byte chunks:
while (ptr + 4 <= end)  
    quint32 data = qFromUnaligned<quint32>(ptr);
    if (data &= 0x80808080U)  
        uint idx = qCountTrailingZeroBits(data);
        ptr += idx / 8;
        return false;
     
    ptr += 4;
 
idx / 8 is the number of trailing zero bytes. However, the bytes which are trailing on little endian are actually leading on big endian! So we can use qCountLeadingZeroBits there. Issue 7: the bundled copy of tinycbor (upstream pull request) Similar to issue 5, the code was reading into the wrong byte:
if (bytesNeeded <= 2)  
    read_bytes_unchecked(it, &it->extra, 1, bytesNeeded);
    if (bytesNeeded == 2)
        it->extra = cbor_ntohs(it->extra);
 
extra has type uint16_t, so it has two bytes. When we need only one byte, we read into the wrong byte, so the resulting number is 256 times higher on big endian than it should be. Adding a temporary one-byte variable fixed it. Issue 8: perfparser, part of Qt Creator (code review) Here it is not trivial to find the issue just looking at the code:
qint32 dataStreamVersion = qToLittleEndian(QDataStream::Qt_DefaultCompiledVersion);
However the linker was producing an error:
undefined reference to QDataStream::Version qbswap(QDataStream::Version)'
On little endian systems, qToLittleEndian is a no-op, but on big endian systems, it is a template function defined for some known types. But it turns out we need to explicitly convert enum values to a simple type, so the fix was passing qint32(QDataStream::Qt_DefaultCompiledVersion) to that function. Issue 9: Qt Personal Information Management (code review) The code in test was trying to represent a number as a sequence of bytes, using reinterpret_cast:
static inline QContactId makeId(const QString &managerName, uint id)
 
    return QContactId(QStringLiteral("qtcontacts:basic%1:").arg(managerName), QByteArray(reinterpret_cast<const char *>(&id), sizeof(uint)));
 
The order of bytes will be different on little endian and big endian systems! The fix was adding this line to the beginning of the function:
id = qToLittleEndian(id);
This will cause the bytes to be reversed on big endian systems. What remains unfixed There are still some bugs, which require deeper investigation, for example: P.S. We are looking for new people to help with maintaining Qt 6. Join our team if you want to do some fun work like described above!

1 November 2020

Paul Wise: FLOSS Activities October 2020

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Spam: reported 2 Debian bug reports and 147 Debian mailing list posts
  • Patches: merged libicns patches
  • Debian packages: sponsored iotop-c
  • Debian wiki: RecentChanges for the month
  • Debian screenshots:

Administration
  • Debian: get us removed from an RBL
  • Debian wiki: reset email addresses, approve accounts

Communication

Sponsors The pytest-rerunfailures/pyemd/morfessor work was sponsored by my employer. All other work was done on a volunteer basis.

19 October 2020

Antoine Beaupr : SSH 2FA with Google Authenticator and Yubikey

About a lifetime ago (5 years), I wrote a tutorial on how to configure my Yubikey for OpenPGP signing, SSH authentication and SSH 2FA. In there, I used the libpam-oath PAM plugin for authentication, but it turns out that had too many problems: users couldn't edit their own 2FA tokens and I had to patch it to avoid forcing 2FA on all users. The latter was merged in the Debian package, but never upstream, and the former was never fixed at all. So I started looking at alternatives and found the Google Authenticator libpam plugin. A priori, it's designed to work with phones and the Google Authenticator app, but there's no reason why it shouldn't work with hardware tokens like the Yubikey. Both use the standard HOTP protocol so it should "just work". After some fiddling, it turns out I was right and you can authenticate with a Yubikey over SSH. Here's that procedure so you don't have to second-guess it yourself.

Installation On Debian, the PAM module is shipped in the google-authenticator source package:
apt install libpam-google-authenticator
Then you need to add the module in your PAM stack somewhere. Since I only use it for SSH, I added this line on top of /etc/pam.d/sshd:
auth required pam_google_authenticator.so nullok
I also used no_increment_hotp debug while debugging to avoid having to renew the token all the time and have more information about failures in the logs. Then reload ssh (not sure that's actually necessary):
service ssh reload

Creating or replacing tokens To create a new key, run this command on the server:
google-authenticator -c
This will prompt you for a bunch of questions. To get them all right, I prefer to just call the right ones on the commandline directly:
google-authenticator --counter-based --qr-mode=NONE --rate-limit=1 --rate-time=30 --emergency-codes=1 --window-size=3
Those are actually the defaults, if my memory serves me right, except for the --qr-mode and --emergency-codes (which can't be disabled so I only print one). I disable the QR code display because I won't be using the codes on my phone, but you would obviously keep it if you want to use the app.

Converting to a Yubikey-compatible secret Unfortunately, the encoding (base32) produced by the google-authenticator command is not compatible with the token expected by the ykpersonalize command used to configure the Yubikey (base16 AKA "hexadecimal", with a fixed 20 bytes length). So you need a way to convert between the two. I wrote a program called oath-convert which basically does this:
read base32
add padding
convert to hex
print
Or, in Python:
def convert_b32_b16(data_b32):
    remainder = len(data_b32) % 8
    if remainder > 0:
        # XXX: assume 6 chars are missing, the actual padding may vary:
        # https://tools.ietf.org/html/rfc3548#section-5
        data_b32 += "======"
    data_b16 = base64.b32decode(data_b32)
    if len(data_b16) < 20:
        # pad to 20 bytes
        data_b16 += b"\x00" * (20 - len(data_b16))
    return binascii.hexlify(data_b16).decode("ascii")
Note that the code assumes a certain token length and will not work correctly for other sizes. To use the program, simply call it with:
head -1 .google_authenticator   oath-convert
Then you paste the output in the prompt:
$ ykpersonalize -1 -o oath-hotp -o append-cr -a
Firmware version 3.4.3 Touch level 1541 Program sequence 2
 HMAC key, 20 bytes (40 characters hex) : [SECRET GOES HERE]
Configuration data to be written to key configuration 1:
fixed: m:
uid: n/a
key: h:[SECRET REDACTED]
acc_code: h:000000000000
OATH IMF: h:0
ticket_flags: APPEND_CR OATH_HOTP
config_flags: 
extended_flags: 
Commit? (y/n) [n]: y
Note that you must NOT pass the -o oath-hotp8 parameter to the ykpersonalize commandline, which we used to do in the Yubikey howto. That is because Google Authenticator tokens are shorter: it's less secure, but it's an acceptable tradeoff considering the plugin is actually maintained. There's actually a feature request to support 8-digit codes so that limitation might eventually be fixed as well. Thanks to the Google Authenticator people and Yubikey people for their support in establishing this procedure.

15 October 2020

Dirk Eddelbuettel: dang 0.0.12: Two new functions

A new release of the dang package is now on CRAN, roughly one year after the last release. The dang package regroups a few functions of mine that had no other home as for example lsos() from a StackOverflow question from 2009 (!!) is one, this overbought/oversold price band plotter from an older blog post is another. More recently added were helpers for data.table to xts conversion and a git repo root finder. This release adds two functions. One was mentioned just days ago in a tweet by Nathan and is a reworked version of something Colin tweeted about a few weeks ago: a little data wrangling off the kewl rtweet to find maximally spammy accounts per search topic. In other words those who include more than N hashtags for given search term. The other is something I, if memory serves, picked up a while back on one of the lists: a base R function to identify non-ASCII characters in a file. It is a C function that is not directly exported by and hence no accessible, so we put it here (with credits, of course). I mentioned it yesterday when announcing tidyCpp as I this C function was the starting point for the new tidyCpp wrapper around some C API of R functions. The (very short) NEWS entry follows.

Changes in version 0.0.12 (2020-10-14)
  • New functions muteTweets and checkPackageAsciiCode.
  • Updated CI setup.

Courtesy of CRANberries, there is a comparison to the previous release. For questions or comments use the issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 October 2020

Dirk Eddelbuettel: tidyCpp 0.0.1: New package

A new package arrived on CRAN a few days ago. It offers a few headers files which wrap (parts) of the C API for R, but in a form that may be a little easier to use for C++ programmers. I have always liked how in Rcpp we offer good parts of the standalone R Math library in a namespace R::. While working recently with a particular C routine (for checking non-ASCII characters that will be part of the next version of the dang package which collecting various goodies in one place), I realized there may be value in collecting a few more such wrappers. So I started a few simple ones starting from simple examples. Currently we have five headers defines.h, globals.h, internals.h, math.h, and shield.h. The first four each correpond to an R header file of the same or similar name, and the last one brings a simple yet effective alternative to PROTECT and UNPROTECT from Rcpp (in a slightly simplified way). None of the headers are complete , for internals.h in particular a lot more could be added (as I noticed today when experimenting with another source file that may be converted). All of the headers can be accessed with a simple #include <tidyCpp> (which, following another C++ convention, does not have a .h or .hpp suffix). And a the package ships these headers, packages desiring to use them only need LinkingTo: tidyCpp. As usage examples, we (right now) have four files in the snippets/ directory of the package. Two of these, convolveExample.cpp and dimnamesExample.cpp both illustrate how one could change example code from Writing R Extensions. Then there are also a very simple defineExample.cpp and a shieldExample.cpp illustrating how much easier Shield() is compared to PROTECT and UNPROTECT. Finally, there is a nice vignette discussing the package motivation with two detailed side-by-side before and after examples that are the aforementioned convolution and dimnames examples. Over time, I expect to add more definitions and wrappers. Feedback would be welcome it seems to hit a nerve already as it currently has more stars than commits even though (prior to this post) I had yet to tweet or blog about it. Please post comments and suggestions at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

28 September 2020

Vincent Bernat: Syncing RIPE, ARIN and APNIC objects with a custom Ansible module

Internet is split into five regional Internet registry: AFRINIC, ARIN, APNIC, LACNIC and RIPE. Each RIR maintains an Internet Routing Registry. An IRR allows one to publish information about the routing of Internet number resources.1 Operators use this to determine the owner of an IP address and to construct and maintain routing filters. To ensure your routes are widely accepted, it is important to keep the prefixes you announce up-to-date in an IRR. There are two common tools to query this database: whois and bgpq4. The first one allows you to do a query with the WHOIS protocol:
$ whois -BrG 2a0a:e805:400::/40
[ ]
inet6num:       2a0a:e805:400::/40
netname:        FR-BLADE-CUSTOMERS-DE
country:        DE
geoloc:         50.1109 8.6821
admin-c:        BN2763-RIPE
tech-c:         BN2763-RIPE
status:         ASSIGNED
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2020-05-19T08:04:58Z
last-modified:  2020-05-19T08:04:58Z
source:         RIPE
route6:         2a0a:e805:400::/40
descr:          Blade IPv6 - AMS1
origin:         AS64476
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2019-10-01T08:19:34Z
last-modified:  2020-05-19T08:05:00Z
source:         RIPE
The second one allows you to build route filters using the information contained in the IRR database:
$ bgpq4 -6 -S RIPE -b AS64476
NN = [
    2a0a:e805::/40,
    2a0a:e805:100::/40,
    2a0a:e805:300::/40,
    2a0a:e805:400::/40,
    2a0a:e805:500::/40
];
There is no module available on Ansible Galaxy to manage these objects. Each IRR has different ways of being updated. Some RIRs propose an API but some don t. If we restrict ourselves to RIPE, ARIN and APNIC, the only common method to update objects is email updates, authenticated with a password or a GPG signature.2 Let s write a custom Ansible module for this purpose!

Notice I recommend that you read Writing a custom Ansible module as an introduction, as well as Syncing MySQL tables for a more instructive example.

Code The module takes a list of RPSL objects to synchronize and returns the body of an email update if a change is needed:
- name: prepare RIPE objects
  irr_sync:
    irr: RIPE
    mntner: fr-blade-1-mnt
    source: whois-ripe.txt
  register: irr

Prerequisites The source file should be a set of objects to sync using the RPSL language. This would be the same content you would send manually by email. All objects should be managed by the same maintainer, which is also provided as a parameter. Signing and sending the result is not the responsibility of this module. You need two additional tasks for this purpose:
- name: sign RIPE objects
  shell:
    cmd: gpg --batch --user noc@example.com --clearsign
    stdin: "  irr.objects  "
  register: signed
  check_mode: false
  changed_when: false
- name: update RIPE objects by email
  mail:
    subject: "NEW: update for RIPE"
    from: noc@example.com
    to: "auto-dbm@ripe.net"
    cc: noc@example.com
    host: smtp.example.com
    port: 25
    charset: us-ascii
    body: "  signed.stdout  "
You also need to authorize the PGP keys used to sign the updates by creating a key-cert object and adding it as a valid authentication method for the corresponding mntner object:
key-cert:  PGPKEY-A791AAAB
certif:    -----BEGIN PGP PUBLIC KEY BLOCK-----
certif:    
certif:    mQGNBF8TLY8BDADEwP3a6/vRhEERBIaPUAFnr23zKCNt5YhWRZyt50mKq1RmQBBY
[ ]
certif:    -----END PGP PUBLIC KEY BLOCK-----
mnt-by:    fr-blade-1-mnt
source:    RIPE
mntner:    fr-blade-1-mnt
[ ]
auth:      PGPKEY-A791AAAB
mnt-by:    fr-blade-1-mnt
source:    RIPE

Module definition Starting from the skeleton described in the previous article, we define the module:
module_args = dict(
    irr=dict(type='str', required=True),
    mntner=dict(type='str', required=True),
    source=dict(type='path', required=True),
)
result = dict(
    changed=False,
)
module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
)

Getting existing objects To grab existing objects, we use the whois command to retrieve all the objects from the provided maintainer.
# Per-IRR variations:
# - whois server
whois =  
    'ARIN': 'rr.arin.net',
    'RIPE': 'whois.ripe.net',
    'APNIC': 'whois.apnic.net'
 
# - whois options
options =  
    'ARIN': ['-r'],
    'RIPE': ['-BrG'],
    'APNIC': ['-BrG']
 
# - objects excluded from synchronization
excluded = ["domain"]
if irr == "ARIN":
    # ARIN does not return these objects
    excluded.extend([
        "key-cert",
        "mntner",
    ])
# Grab existing objects
args = ["-h", whois[irr],
        "-s", irr,
        *options[irr],
        "-i", "mnt-by",
        module.params['mntner']]
proc = subprocess.run("whois", *args, capture_output=True)
if proc.returncode != 0:
    raise AnsibleError(
        f"unable to query whois:  args ")
output = proc.stdout.decode('ascii')
got = extract(output, excluded)
The first part of the code setup some IRR-specific constants: the server to query, the options to provide to the whois command and the objects to exclude from synchronization. The second part invokes the whois command, requesting all objects whose mnt-by field is the provided maintainer. Here is an example of output:
$ whois -h whois.ripe.net -s RIPE -BrG -i mnt-by fr-blade-1-mnt
[ ]
inet6num:       2a0a:e805:300::/40
netname:        FR-BLADE-CUSTOMERS-FR
country:        FR
geoloc:         48.8566 2.3522
admin-c:        BN2763-RIPE
tech-c:         BN2763-RIPE
status:         ASSIGNED
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2020-05-19T08:04:59Z
last-modified:  2020-05-19T08:04:59Z
source:         RIPE
[ ]
route6:         2a0a:e805:300::/40
descr:          Blade IPv6 - PA1
origin:         AS64476
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2019-10-01T08:19:34Z
last-modified:  2020-05-19T08:05:00Z
source:         RIPE
[ ]
The result is passed to the extract() function. It parses and normalizes the results into a dictionary mapping object names to objects. We store the result in the got variable.
def extract(raw, excluded):
    """Extract objects."""
    # First step, remove comments and unwanted lines
    objects = "\n".join([obj
                         for obj in raw.split("\n")
                         if not obj.startswith((
                                 "#",
                                 "%",
                         ))])
    # Second step, split objects
    objects = [RPSLObject(obj.strip())
               for obj in re.split(r"\n\n+", objects)
               if obj.strip()
               and not obj.startswith(
                   tuple(f" x :" for x in excluded))]
    # Last step, put objects in a dict
    objects =  repr(obj): obj
               for obj in objects 
    return objects
RPSLObject() is a class enabling normalization and comparison of objects. Look at the module code for more details.
>>> output="""
... inet6num:       2a0a:e805:300::/40
... [ ]
... """
>>> pprint( k: str(v) for k,v in extract(output, excluded=[]) )
 '<Object:inet6num:2a0a:e805:300::/40>':
   'inet6num:       2a0a:e805:300::/40\n'
   'netname:        FR-BLADE-CUSTOMERS-FR\n'
   'country:        FR\n'
   'geoloc:         48.8566 2.3522\n'
   'admin-c:        BN2763-RIPE\n'
   'tech-c:         BN2763-RIPE\n'
   'status:         ASSIGNED\n'
   'mnt-by:         fr-blade-1-mnt\n'
   'remarks:        synced with cmdb\n'
   'source:         RIPE',
 '<Object:route6:2a0a:e805:300::/40>':
   'route6:         2a0a:e805:300::/40\n'
   'descr:          Blade IPv6 - PA1\n'
   'origin:         AS64476\n'
   'mnt-by:         fr-blade-1-mnt\n'
   'remarks:        synced with cmdb\n'
   'source:         RIPE' 

Comparing with wanted objects Let s build the wanted dictionary using the same structure, thanks to the extract() function we can use verbatim:
with open(module.params['source']) as f:
    source = f.read()
wanted = extract(source, excluded)
The next step is to compare got and wanted to build the diff object:
if got != wanted:
    result['changed'] = True
    if module._diff:
        result['diff'] = [
            dict(before_header=k,
                 after_header=k,
                 before=str(got.get(k, "")),
                 after=str(wanted.get(k, "")))
            for k in set((*wanted.keys(), *got.keys()))
            if k not in wanted or k not in got or wanted[k] != got[k]]

Returning updates The module does not have a side effect. If there is a difference, we return the updates to send by email. We choose to include all wanted objects in the updates (contained in the source variable) and let the IRR ignore unmodified objects. We also append the objects to be deleted by adding a delete: attribute to each them them.
# We send all source objects and deleted objects.
deleted_mark = f" 'delete:':16 deleted by CMDB"
deleted = "\n\n".join([f" got[k].raw \n deleted_mark "
                       for k in got
                       if k not in wanted])
result['objects'] = f" source \n\n deleted "
module.exit_json(**result)

The complete code is available on GitHub. The module supports both --diff and --check flags. It does not return anything if no change is detected. It can work with APNIC, RIPE and ARIN. It is not perfect: it may not detect some changes,3 it is not able to modify objects not owned by the provided maintainer4 and some attributes cannot be modified, requiring to manually delete and recreate the updated object.5 However, this module should automate 95% of your needs.

  1. Other IRRs exist without being attached to a RIR. The most notable one is RADb.
  2. ARIN is phasing out this method in favor of IRR-online. RIPE has an API available, but email updates are still supported and not planned to be deprecated. APNIC plans to expose an API.
  3. For ARIN, we cannot query key-cert and mntner objects and therefore we cannot detect changes in them. It is also not possible to detect changes to the auth mechanisms of a mntner object.
  4. APNIC do not assign top-level objects to the maintainer associated with the owner.
  5. Changing the status of an inetnum object requires deleting and recreating the object.

25 September 2020

Colin Watson: Porting Launchpad to Python 3: progress report

Launchpad still requires Python 2, which in 2020 is a bit of a problem. Unlike a lot of the rest of 2020, though, there s good reason to be optimistic about progress. I ve been porting Python 2 code to Python 3 on and off for a long time, from back when I was on the Ubuntu Foundations team and maintaining things like the Ubiquity installer. When I moved to Launchpad in 2015 it was certainly on my mind that this was a large body of code still stuck on Python 2. One option would have been to just accept that and leave it as it is, maybe doing more backporting work over time as support for Python 2 fades away. I ve long been of the opinion that this would doom Launchpad to being unmaintainable in the long run, and since I genuinely love working on Launchpad - I find it an incredibly rewarding project - this wasn t something I was willing to accept. We re already seeing some of our important dependencies dropping support for Python 2, which is perfectly reasonable on their terms but which is starting to become a genuine obstacle to delivering important features when we need new features from newer versions of those dependencies. It also looks as though it may be difficult for us to run on Ubuntu 20.04 LTS (we re currently on 16.04, with an upgrade to 18.04 in progress) as long as we still require Python 2, since we have some system dependencies that 20.04 no longer provides. And then there are exciting new features like type hints and async/await that we d like to be able to use. However, until last year there were so many blockers that even considering a port was barely conceivable. What changed in 2019 was sorting out a trifecta of core dependencies. We ported our database layer, Storm. We upgraded to modern versions of our Zope Toolkit dependencies (after contributing various fixes upstream, including some substantial changes to Zope s test runner that we d carried as local patches for some years). And we ported our Bazaar code hosting infrastructure to Breezy. With all that in place, a port seemed more of a realistic possibility. Still, even with this, it was never going to be a matter of just following some standard porting advice and calling it good. Launchpad has almost a million lines of Python code in its main git tree, and around 250 dependencies of which a number are quite Launchpad-specific. In a project that size, not only is following standard porting advice an extremely time-consuming task in its own right, but just about every strange corner case is going to show up somewhere. (Did you know that StringIO.StringIO(None) and io.StringIO(None) do different things even after you account for the native string vs. Unicode text difference? How about the behaviour of .union() on a subclass of frozenset?) Launchpad s test suite is fortunately extremely thorough, but even just starting up the test suite involves importing most of the data model code, so before you can start taking advantage of it you have to make a large fraction of the codebase be at least syntactically-correct Python 3 code and use only modules that exist in Python 3 while still working in Python 2; in a project this size that turns out to be a large effort on its own, and can be quite risky in places. Canonical s product engineering teams work on a six-month cycle, but it just isn t possible to cram this sort of thing into six months unless you do literally nothing else, and please can we put all feature development on hold while we run to stand still is a pretty tough sell to even the most understanding management. Fortunately, we ve been able to grow the Launchpad team in the last year or so, and so it s been possible to put Python 3 on our roadmap on the understanding that we aren t going to get all the way there in one cycle, while still being able to do other substantial feature development work as well. So, with all that preamble, what have we done this cycle? We ve taken a two-pronged approach. From one end, we identified 147 classes that needed to be ported away from some compatibility code in our database layer that was substantially less friendly to Python 3: we ve ported 38 of those, so there s clearly a fair bit more to do, but we were able to distribute this work out among the team quite effectively. From the other end, it was clear that it would be very inefficient to do general porting work when any attempt to even run the test suite would run straight into the same crashes in the same order, so I set myself a target of getting the test suite to start up, and started hacking on an enormous git branch that I never expected to try to land directly: instead, I felt free to commit just about anything that looked reasonable and moved things forward even if it was very rough, and every so often went back to tidy things up and cherry-pick individual commits into a form that included some kind of explanation and passed existing tests so that I could propose them for review. This strategy has been dramatically more successful than anything I ve tried before at this scale. So far this cycle, considering only Launchpad s main git tree, we ve landed 137 Python-3-relevant merge proposals for a total of 39552 lines of git diff output, keeping our existing tests passing along the way and deploying incrementally to production. We have about 27000 more lines of patch at varying degrees of quality to tidy up and merge. Our main development branch is only perhaps 10 or 20 more patches away from the test suite being able to start up, at which point we ll be able to get a buildbot running so that multiple developers can work on this much more easily and see the effect of their work. With the full unlanded patch stack, about 75% of the test suite passes on Python 3! This still leaves a long tail of several thousand tests to figure out and fix, but it s a much more incrementally-tractable kind of problem than where we started. Finally: the funniest (to me) bug I ve encountered in this effort was the one I encountered in the test runner and fixed in zopefoundation/zope.testrunner#106: IDs of failing tests were written to a pipe, so if you have a test suite that s large enough and broken enough then eventually that pipe would reach its capacity and your test runner would just give up and hang. Pretty annoying when it meant an overnight test run didn t give useful results, but also eloquent commentary of sorts.

16 September 2020

Steve Kemp: Implementing a FORTH-like language ..

Four years ago somebody posted a comment-thread describing how you could start writing a little reverse-polish calculator, in C, and slowly improve it until you had written a minimal FORTH-like system: At the time I read that comment I'd just hacked up a simple FORTH REPL of my own, in Perl, and I said "thanks for posting". I was recently reminded of this discussion, and decided to work through the process. Using only minimal outside resources the recipe worked as expected! The end-result is I have a working FORTH-lite, or FORTH-like, interpreter written in around 2000 lines of golang! Features include: To give a flavour here we define a word called star which just outputs a single start-character:
: star 42 emit ;
Now we can call that (NOTE: We didn't add a newline here, so the REPL prompt follows it, that's expected):
> star
*>
To make it more useful we define the word "stars" which shows N stars:
> : stars dup 0 > if 0 do star loop else drop then ;
> 0 stars
> 1 stars
*> 2 stars
**> 10 stars
**********>
This example uses both if to test that the parameter on the stack was greater than zero, as well as do/loop to handle the repetition. Finally we use that to draw a box:
> : squares 0 do over stars cr loop ;
> 4 squares
****
****
****
****
> 10 squares
**********
**********
**********
**********
**********
**********
**********
**********
**********
**********
For fun we allow decompiling the words too:
> #words 0 do dup dump loop
..
Word 'square'
 0: dup
 1: *
Word 'cube'
 0: dup
 1: square
 2: *
Word '1+'
 0: store 1.000000
 2: +
Word 'test_hot'
  0: store 0.000000
  2: >
  3: if
  4: [cond-jmp 7.000000]
  6: hot
  7: then
..
Anyway if that is at all interesting feel free to take a peak. There's a bit of hackery there to avoid the use of return-stacks, etc. Compared to gforth this is actually more featureful in some areas: Find the code here:

Next.

Previous.